The Saudi Novel Corpus: Design and Compilation
Abstract
Arabic has recently received significant attention from corpus compilers. This situation has led to the creation of many Arabic corpora that cover various genres, most notably the newswire genre. Yet Arabic novels, and specifically those authored by Saudi writers, lack sufficient digital datasets that would enhance linguistic and stylistic studies of these works. Thus, Arabic lags behind English and other European languages in this context. In this paper, we present the Saudi Novels Corpus, built to be a valuable resource for research communities. We present the procedures followed and the decisions made in creating the corpus. We describe and clarify the design criteria, data collection methods, process of annotation, and encoding. In addition, we present preliminary results that emerged from the analysis of the corpus content. We consider the work described in this paper as initial steps to bridge the existing gap between linguistics and Arabic literary texts. Further work is planned to improve the quality of the corpus by adding advanced features.
Similar resources
Balanced corpus of informal spoken Czech: compilation, design and findings
The paper presents ORAL2008, a new 1-million corpus of spoken Czech compiled within the framework of the Czech National Corpus project. ORAL2008 is designed as a representation of authentic spoken language used in informal situations and it is balanced in the main sociolinguistic categories of speakers. The paper concentrates also on the data collection, its broad coverage and the transcription...
Design and compilation of a specialized Spanish-German parallel corpus
This paper discusses the design and compilation of the TRIS corpus, a specialized parallel corpus of Spanish and German texts. It will be used for phraseological research aimed at improving statistical machine translation. The corpus is based on the European database of Technical Regulations Information System (TRIS), containing 995 original documents written in German and Spanish and their tra...
Wuthering Heights and the concept of morality: a sociological study of the novel
To discuss my point, I have collected quite a number of articles, anthologies, and books about "Wuthering Heights" applying various ideas and theories to this fantastic story. Hence, I have come to believe that Gadamer and Jauss are right when they claim that "the individual human mind is the center and origin of all meaning," that reading literature is a reader-oriented activity, that it ...
Compilation and Exploitation of the IJS-ELAN Parallel Corpus
With more and more text being available in electronic form, it is becoming relatively easy to obtain digital texts together with their translations. The paper presents the processing steps necessary to compile such texts into parallel corpora, an extremely useful language resource. Parallel corpora can be used as a translation aid for second-language learners, for translators and lexicographers...
The International Corpus of Arabic: Compilation, Analysis and Evaluation
This paper focuses on a project for building the first International Corpus of Arabic (ICA). It is planned to contain 100 million analyzed tokens, with an interface that allows users to interact with the corpus data in a number of ways [ICA website]. ICA is a representative corpus of Arabic that was initiated in 2006; it is intended to cover the Modern Standard Arabic (MSA) language as bei...
Journal
Journal title: Applied Sciences
Year: 2022
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app12136648